Identifying Sections in Scientific Abstracts using Conditional Random Fields
Abstract
OBJECTIVE: Prior knowledge of the rhetorical structure of scientific abstracts is useful for text-mining tasks such as information extraction, information retrieval, and automatic summarization. This paper presents a novel approach to classifying the sentences of scientific abstracts into four sections: objective, methods, results, and conclusions. METHOD: Formalizing the categorization task as a sequential labeling problem, we employ Conditional Random Fields (CRFs) to assign a section label to each abstract sentence. The training corpus is acquired automatically from Medline abstracts. RESULTS: The proposed method outperformed previous approaches, achieving 95.5% per-sentence accuracy and 68.8% per-abstract accuracy. CONCLUSION: The experimental results show that CRFs model the rhetorical structure of abstracts better than the previous approaches.
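To make the sequential-labeling view concrete, the following is a minimal sketch of Viterbi decoding over the four section labels, the inference step a linear-chain CRF uses to pick the best label sequence for a whole abstract. It is not the authors' implementation: the per-sentence scores in `EXAMPLE_EMISSIONS`, the transition scores built by `make_transitions`, and all function names are hypothetical values chosen for illustration. In a trained CRF these scores would come from learned feature weights.

```python
from typing import Dict, List

LABELS = ["OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]


def make_transitions() -> Dict[str, Dict[str, float]]:
    """Hand-set (hypothetical) transition scores favoring the usual section order."""
    table: Dict[str, Dict[str, float]] = {}
    for i, prev in enumerate(LABELS):
        table[prev] = {}
        for j, nxt in enumerate(LABELS):
            if j < i:
                table[prev][nxt] = -5.0  # going back to an earlier section
            elif j - i <= 1:
                table[prev][nxt] = 0.0   # staying put or moving to the next section
            else:
                table[prev][nxt] = -1.0  # skipping a section
    return table


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence for one abstract."""
    if not emissions:
        return []
    # best[i][y]: best total score of any path that ends in label y at sentence i
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def independent(emissions: List[Dict[str, float]]) -> List[str]:
    """Baseline: label each sentence on its own, ignoring its neighbors."""
    return [max(LABELS, key=lambda y: e[y]) for e in emissions]


# Hypothetical per-sentence scores for a four-sentence abstract.
EXAMPLE_EMISSIONS = [
    {"OBJECTIVE": 2.0, "METHODS": 0.0, "RESULTS": 0.0, "CONCLUSIONS": 0.0},
    {"OBJECTIVE": 0.0, "METHODS": 2.0, "RESULTS": 0.0, "CONCLUSIONS": 0.0},
    {"OBJECTIVE": 0.0, "METHODS": 0.0, "RESULTS": 2.0, "CONCLUSIONS": 0.0},
    {"OBJECTIVE": 0.0, "METHODS": 1.0, "RESULTS": 0.0, "CONCLUSIONS": 0.8},
]

if __name__ == "__main__":
    print("independent:", independent(EXAMPLE_EMISSIONS))
    print("viterbi:    ", viterbi(EXAMPLE_EMISSIONS, make_transitions()))
```

In this toy input the last sentence looks slightly more like METHODS when scored on its own, but the sequence-level decoder labels it CONCLUSIONS because returning to an earlier section is penalized. This kind of ordering constraint is what a sequential model can capture and a per-sentence classifier cannot.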
Similar resources
Using conditional random fields for result identification in biomedical abstracts
The abstracts of biomedical papers usually contain three sections: objective, methods, and results-conclusion. The results-conclusion section is the most important because it usually describes the main contribution of a paper. Unfortunately, not all biomedical journals follow this three-section format. In this paper, we propose a machine learning (ML) based approach to automatically identify th...
Reference String Extraction Using Line-Based Conditional Random Fields
The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publ...
Relationship Extraction from Biomedical Documents using Conditional Random Fields
Extracting complex relationships automatically from unstructured information resources is a challenging problem. It is an important problem in the present age of abundant machine-processable information, as there is a need to build intelligent knowledge-aware applications for tasks such as search, extraction and reasoning. We have used Conditional Random Fields (CRFs) to identify various relations...
A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature
BACKGROUND: Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity r...
Identifying treatments, groups and outcomes in medical abstracts
Detecting and extracting treatments, treatment groups and outcomes is a key step in generating summaries of medical research papers. We describe initial results in applying named-entity recognition methods to the task of extracting such entities from BMJ abstracts. Results are promising, showing that a conditional random field approach using word and semantic features appears to be more useful f...